{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1B. Data Cleaning: Seasonality\n",
    "<hr>\n",
    "\n",
    "### Overall Goal:\n",
    "The goal of this part of the cleaning is to get the dataset in a usable format. There were various challenges due to the csv containing commas in the pricing column, thus creating a problem with the number of columns for select entries (where the price is 1,000+). A lot of time had to be spent to recreating the csv in a usable format. Here we do this by using various strip and string concatenate functions and then outputting the correct file. Lastly, we see only a small subset of date don't have prices. It doesn't make much sense to predict those seasonality prices and then run a seasonality analysis. Given it is a small number of the overall entries, we drop the null values as so it doesn't cause errors with our seasonality analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.cm as cmx\n",
    "import matplotlib.colors as colors\n",
    "import numpy as np\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>listing_id</th>\n",
       "      <th>date</th>\n",
       "      <th>available</th>\n",
       "      <th>new price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3604481.0</td>\n",
       "      <td>1/1/2015</td>\n",
       "      <td>t</td>\n",
       "      <td>$600.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3604481.0</td>\n",
       "      <td>1/2/2015</td>\n",
       "      <td>t</td>\n",
       "      <td>$600.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3604481.0</td>\n",
       "      <td>1/3/2015</td>\n",
       "      <td>t</td>\n",
       "      <td>$600.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3604481.0</td>\n",
       "      <td>1/4/2015</td>\n",
       "      <td>t</td>\n",
       "      <td>$600.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3604481.0</td>\n",
       "      <td>1/5/2015</td>\n",
       "      <td>t</td>\n",
       "      <td>$600.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   listing_id      date available new price\n",
       "0   3604481.0  1/1/2015         t   $600.00\n",
       "1   3604481.0  1/2/2015         t   $600.00\n",
       "2   3604481.0  1/3/2015         t   $600.00\n",
       "3   3604481.0  1/4/2015         t   $600.00\n",
       "4   3604481.0  1/5/2015         t   $600.00"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('../datasets/raw_datasets/calendar.csv')\n",
    "data['Unnamed: 4'].fillna(value='',inplace=True)\n",
    "price_strip = data['price'].str.strip()\n",
    "data['new price']= price_strip.map(str)+data['Unnamed: 4'].map(str)\n",
    "data[data['new price']=='nan']=np.NaN\n",
    "data_fill = data.dropna()\n",
    "data_fill = data_fill.drop('price', 1)\n",
    "data_fill.drop(\"Unnamed: 4\",axis=1,inplace=True)\n",
    "data_fill.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "source": [
    "We see that everything looks great! We export it to the file calendar_clean.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data_fill.to_csv(\"../datasets/calendar_clean.csv\",index=False)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
